Data Visualisation with ggplot2

Data Carpentry for Social Sciences and Humanities

2024-10-01

Why ggplot2?

…because these are ‘base’ plots

plot(number_items ~ no_membrs, 
     interviews_plotting, 
     col = "blue")

boxplot(rooms ~ village, 
        interviews_plotting, 
        col = c("blue", "green", "red"))

Why ggplot2?

…and these are ggplots 😎

interviews_plotting %>% 
   ggplot(aes(x = no_membrs, y = number_items, color = village)) +
      geom_count() +
      theme_bw() +
      labs(x = "Number of members in a household", 
           y = "Number of items")

interviews_plotting %>%
  ggplot(aes(x = village, y = rooms, fill = village)) +
    geom_violin() +
    theme_minimal() +
    theme(legend.position = "none",
        panel.grid.major.x = element_blank())

ggplot2

ggplot2 is a package (included in tidyverse) for creating highly customisable plots that are built step-by-step by adding layers.

The separation of a plot into layers allows a high degree of flexibility with minimal effort.

<DATA> %>%
    ggplot(aes(<MAPPINGS>)) +
    <GEOM_FUNCTION>() +
    <CUSTOMISATION>
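As a minimal sketch of how the template fills in, the block below builds a plot from a small made-up data frame. `toy_households` and its values are hypothetical stand-ins for `interviews_plotting`, not survey data; only the column names mirror the lesson.

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for interviews_plotting
toy_households <- tibble(
  village   = c("A", "A", "B", "B", "C", "C"),
  no_membrs = c(3, 7, 4, 9, 5, 6),
  rooms     = c(1, 3, 2, 4, 2, 3)
)

toy_plot <- toy_households %>%                               # <DATA>
  ggplot(aes(x = no_membrs, y = rooms, colour = village)) +  # <MAPPINGS>
  geom_point() +                                             # <GEOM_FUNCTION>
  theme_bw() +                                               # <CUSTOMISATION>
  labs(x = "Household members", y = "Rooms")

toy_plot  # printing the object draws the plot
```

Each `+` adds one layer or setting, so any line after `ggplot()` can be removed or swapped without touching the others.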

A fuzzy monster in a beret and scarf, critiquing their own column graph on a canvas in front of them while other assistant monsters (also in berets) carry over boxes full of elements that can be used to customize a graph (like themes and geometric shapes). In the background is a wall with framed data visualizations. Stylized text reads 'ggplot2: build a data masterpiece.'

Artwork by [@allison_horst](https://twitter.com/allison_horst)

Data Visualisation Exercises

Exercise 1

6 mins

Create a new code chunk with the label fig-rooms-scatter.

Create a scatter plot of rooms by village, with respondent_wall_type shown in different colours.

Does this seem like a good way to display the relationship between these variables?

What other kinds of plots might you use to show this type of data?

Exercise 1: Solution

interviews_plotting %>%
    ggplot(aes(x = village, y = rooms, colour = respondent_wall_type)) +
    geom_point() +
    theme_classic() +
    scale_colour_viridis_d() # colourblind-friendly palette; must match the colour aesthetic

interviews_plotting %>%
    ggplot(aes(x = village, y = rooms, colour = respondent_wall_type)) +
    geom_jitter(width = 0.2, height = 0.2) +
    theme_classic() +
    scale_colour_viridis_d() # colourblind-friendly palette; must match the colour aesthetic

Captioning

Now that we have created the plot, we can also create a caption using the fig-cap chunk option.

```{r}
#| label: fig-rooms-scatter
#| fig-cap: "This plot shows the relationship between the variables rooms and village, but doesn't do a very good job at it."

interviews_plotting %>%
    ggplot(aes(x = village, y = rooms, colour = respondent_wall_type)) +
    geom_point() +
    theme_classic() +
    scale_colour_viridis_d() # colourblind-friendly palette; must match the colour aesthetic
```

Figure 1: This plot shows the relationship between the variables rooms and village, but doesn’t do a very good job at it.

Exercise 2

4 mins

Boxplots are useful summaries, but hide the shape of the distribution. For example, if the distribution is bimodal, we would not see it in a boxplot.

Replace the box plot with a violin plot (see geom_violin()).
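To illustrate the point about bimodal distributions, here is a small sketch using simulated values rather than the survey data. The `bimodal` data frame and its numbers are made up for this example.

```r
library(ggplot2)

set.seed(42)  # make the simulated values reproducible

# Simulated (not survey) data: two clusters of values, i.e. a bimodal distribution
bimodal <- data.frame(
  group = "simulated",
  value = c(rnorm(100, mean = 2, sd = 0.5),
            rnorm(100, mean = 8, sd = 0.5))
)

# The box plot summarises median and quartiles; the gap between clusters is not visible
box_version <- ggplot(bimodal, aes(x = group, y = value)) +
  geom_boxplot()

# The violin plot draws a density estimate, so both peaks show up
violin_version <- ggplot(bimodal, aes(x = group, y = value)) +
  geom_violin()

box_version
violin_version
```

Comparing the two outputs side by side should show a single box in the first plot and two distinct bulges in the second.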

Exercise 2: Solution

interviews_plotting %>%
  ggplot(aes(x = respondent_wall_type, y = rooms)) +
  geom_violin() +
  geom_jitter(alpha = 0.5, color = "tomato")

Exercise 3

8 mins

Create a bar plot showing the proportion of respondents in each village who are or are not part of an irrigation association (memb_assoc).

Include only respondents who answered that question in the calculations and plot.

Which village had the lowest proportion of respondents in an irrigation association?

Hint

percent_memb_assoc <- interviews_plotting %>%
  filter(!is.na(memb_assoc)) %>%
  count(village, memb_assoc) %>%
  group_by(village) %>%
  mutate(percent = (n / sum(n)) * 100) %>%
  ungroup()
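To see what each step of this pipeline does, it can help to run it on a handful of rows. The sketch below uses a made-up table, `toy_interviews`, with hypothetical villages "A" and "B"; it is not the survey data.

```r
library(dplyr)

# Hypothetical toy data: seven respondents, one of whom did not answer
toy_interviews <- tibble(
  village    = c("A", "A", "A", "A", "B", "B", "B"),
  memb_assoc = c("yes", "no", "no", NA, "yes", "yes", "no")
)

toy_percent <- toy_interviews %>%
  filter(!is.na(memb_assoc)) %>%             # drop the unanswered row
  count(village, memb_assoc) %>%             # one row per village/answer, count in n
  group_by(village) %>%                      # so sum(n) is computed per village
  mutate(percent = (n / sum(n)) * 100) %>%   # share of answers within each village
  ungroup()

toy_percent

# Sanity check: percentages should add up to 100 within every village
toy_percent %>%
  group_by(village) %>%
  summarise(total = sum(percent))
```

Grouping by village before `mutate()` is what makes the percentages relative to each village rather than to the whole table.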

Exercise 3: Solution

percent_memb_assoc <- interviews_plotting %>%
  filter(!is.na(memb_assoc)) %>%
  count(village, memb_assoc) %>%
  group_by(village) %>%
  mutate(percent = (n / sum(n)) * 100) %>%
  ungroup()

percent_memb_assoc %>%
  ggplot(aes(x = village, y = percent, fill = memb_assoc)) +
  geom_col(position = "dodge") # geom_col() is shorthand for geom_bar(stat = "identity")

Exercise 4

4 mins

Experiment with at least two different themes. Build the previous plot using each of those themes.

Which do you like best?

Hint

theme_minimal
theme_void
theme_classic

theme_dark
theme_grey
theme_light
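One way to compare themes without retyping the plot is to store it in an object and add a different theme each time. The sketch below assumes a small made-up summary table, `toy_bars`; the values are hypothetical.

```r
library(ggplot2)

# Hypothetical summary table with made-up percentages
toy_bars <- data.frame(
  village    = c("A", "A", "B", "B"),
  memb_assoc = c("no", "yes", "no", "yes"),
  percent    = c(60, 40, 25, 75)
)

# Build the plot once and keep it in an object
base_plot <- ggplot(toy_bars, aes(x = village, y = percent, fill = memb_assoc)) +
  geom_col(position = "dodge")

# Add a different theme each time; base_plot itself is unchanged
base_plot + theme_minimal()
base_plot + theme_dark()
```

Because `+` returns a new plot, the same `base_plot` can be reused with as many themes as you like.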

Exercise 4: Solution

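# percent_items is not created in these slides; it is assumed to have been
# built earlier in the lesson (percentage of households per village owning each item)
percent_items %>%
    ggplot(aes(x = village, y = percent)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~ items) +
    theme_bw() +
    theme(panel.grid = element_blank())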
